Purpose: Refractive error is a complex trait with multiple genetic and environmental risk factors, and is the most common
cause of preventable blindness worldwide. The common nature of the trait suggests the presence of many genetic
factors that individually may have modest effects. To achieve an adequate sample size to detect these common variants,
large, international collaborations have formed. These consortia typically use meta-analysis to combine multiple studies
from many different populations. This approach is robust to differences between populations; however, it does not
compensate for the different haplotypes in each genetic background, evidenced by different alleles in linkage
disequilibrium with the causative variant. We used the Age-Related Eye Disease Study (AREDS) cohort to replicate published
significant associations at two loci on chromosome 15 from two genome-wide association studies (GWASs). The single
nucleotide polymorphisms (SNPs) that exhibited association on chromosome 15 in the original studies did not show evidence
of association with refractive error in the AREDS cohort. This paper seeks to determine whether the non-replication in
this AREDS sample may be due to the limited number of SNPs chosen for replication.

Methods: We selected all SNPs that were genotyped on the Illumina Omni2.5v1_B array or with custom TaqMan assays, or imputed
from the GWAS data, in the region surrounding the SNPs from the Consortium for Refractive Error and Myopia study.
We analyzed the SNPs for association with refractive error using standard regression methods in PLINK. The effective 
number of tests was calculated using the Genetic Type I Error Calculator. 

Results: Although the SNPs used in the Consortium for Refractive Error and Myopia study did not show
any evidence of association with refractive error in this AREDS sample, other SNPs within the candidate regions
demonstrated an association with refractive error. Significant evidence of association was found using the hyperopia
categorical trait, with the most significant SNPs being rs1357179 on 15q14 (p=1.69×10⁻³) and rs7164400 on 15q25
(p=8.39×10⁻⁴), both of which passed the replication thresholds.

Conclusions: This study adds to the growing body of evidence that the most significant SNPs found in one population may
not be significant in another population because of differences in linkage disequilibrium structure and/or allele
frequency. This suggests that replication studies should include less significant SNPs in an associated region rather
than only a few SNPs selected by a significance threshold.



Refractive error (RE) is the leading cause of preventable 
blindness, with large societal, economic, and public health 
implications. Around 25% of U.S. adults are myopic [1,2], 
and in some parts of Southeast Asia, the prevalence is now 
in excess of 70% among teens [3,4] and young adults [5]. In 
addition to the personal impact of the costs of eyeglasses, 
contact lenses, or refractive surgery, high-grade myopia 
increases the risk of other ocular problems such as retinal 
degeneration, cataracts, glaucoma, and choroidal
neovascularization [6].

As part of an international effort to characterize the
risk factors responsible for refractive errors and the recent
increase in prevalence observed in many countries and
populations, environmental risk factors are receiving needed
attention in addition to genetic influences. Twin studies and
family aggregation studies estimate the heritability of
refractive errors to be on the order of 50%-90% [7-10]. Two
recent genome-wide association studies (GWASs) identified
strong association with refractive error at two locations on
chromosome 15. Solouki et al. [11] reported an association
on 15q14 that was subsequently replicated in several other
populations [12,13]. Hysi et al. [14] published a second
locus on 15q25 at the same time. The Consortium for
Refractive Error and Myopia (CREAM) recently performed a
large meta-analysis of both loci in 31 population cohorts
[15] and replicated only the 15q14 locus. However, the single
nucleotide polymorphisms (SNPs) in this region did not
replicate robustly in every cohort. The Age-Related Eye
Disease Study (AREDS) did not contribute significantly to
the association signal at either chromosome 15 locus. We
hypothesized that the choice of replicating 14 SNPs on 15q14
and five SNPs on 15q25 (henceforth referred to as "CREAM
replication SNPs") was too narrow. The approach of narrowly
selecting SNPs for replicating association signals assumes
that all populations with a true signal in the region have
the same SNPs associated with the trait. Given the
heterogeneous nature of refractive error and the different
patterns of linkage disequilibrium across populations, this
method may not reflect the association strength in each
population. Although Verhoeven and colleagues [15] noted
that the tested SNPs had similar allele frequencies across
the populations in the study, the chosen SNPs may not
adequately capture the underlying haplotypes in every
population.

Various authors have suggested that regional replication 
of GWAS signals is more appropriate than simply selecting 
the most significant SNP from each region. Recently, in 
a replication study of fasting plasma glucose in African 
Americans, Ramos et al. [16] showed that local replication of 
a candidate region (performed by querying a 500 kb window 
centered on all 29 SNPs that were associated with the trait) 
resulted in detection of new significantly associated SNPs. 
This result confirmed Yang et al.'s finding [17] that
variation due to a specific locus will be underestimated if
only the most significant SNP in the region is selected.
Asimit et al. [18] reported that the current choices for SNP
replication detect only a small proportion of causal variants
and are insufficiently powered. In our study, we show that in
the AREDS data, additional SNPs located nearby have a stronger
association than the originally chosen CREAM replication
SNPs [16,17,19-21].

METHODS 

Participants: The AREDS cohort was initially designed as 
a long-term, multicenter, prospective study to assess the 
clinical course of age-related macular degeneration (AMD) 
and age-related cataract [22]. In addition to collecting natural 
history data, the AREDS included a randomized clinical trial 
of high-dose vitamin and mineral supplements for AMD 
and a clinical trial of high-dose vitamin supplements for 
cataract [22-24]. Before the study was initiated, the protocol 
was approved by an independent data and safety monitoring 
committee and by the institutional review board for each
clinical center. Written informed consent was obtained from all
participants in accordance with the Declaration of Helsinki.
AREDS participants were 55 to 80 years of age at enrollment
and had to be free of any illness or condition that would
make long-term follow-up or compliance with study
medications unlikely or difficult. Visual acuity measurement of all
participants was performed with the electronic visual acuity
tester (EVA) using the electronic early treatment diabetic 
retinopathy study (E-ETDRS) visual acuity testing protocol. 
A refraction measurement was performed for participants 
with visual acuity of fewer than 74 letters in each eye at the 
initial visit and all participants at the randomization visit. 
For the current analysis, a subset of the control group from 
the original AREDS cohort was included: 2,000 Caucasian 
participants aged 60 and older who did not have AMD and 
were further screened to exclude individuals with cataracts, 
retinitis pigmentosa, color blindness, other congenital eye 
problems, LASIK, artificial lenses, and other eye surgery. 
Refractive error at baseline enrollment in the AREDS [22-25] 
was analyzed, taking the mean spherical equivalent (MSE) 
across both eyes (or spherical equivalent in a single eye when 
both eyes were not measured) as the trait of interest. Age, 
gender, and the first three principal components (to adjust for 
significant population stratification) were included as
covariates. Appendix 1 shows the clinical characteristics and the
number of cases with high, moderate, and mild myopia and 
the number of cases with high or moderate hyperopia. Since 
the number of individuals in these subcategories was small, 
separate analysis of each subcategory was not performed due 
to inadequate power. 
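The trait definition above can be sketched in code. The text does not spell out how spherical equivalent was derived; the sketch below assumes the standard clinical convention (sphere plus half the cylinder power, in diopters) and a hypothetical per-eye data layout of (sphere, cylinder) pairs, with None for an unmeasured eye. It is an illustration, not the AREDS processing code.

```python
from typing import Optional, Tuple

Eye = Optional[Tuple[float, float]]  # hypothetical layout: (sphere, cylinder) in diopters, or None


def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """Spherical equivalent in diopters: sphere plus half the cylinder (standard convention, assumed)."""
    return sphere + cylinder / 2.0


def mean_spherical_equivalent(right: Eye, left: Eye) -> Optional[float]:
    """MSE across both eyes, or the single measured eye's SE; None if neither eye was measured."""
    measured = [eye for eye in (right, left) if eye is not None]
    if not measured:
        return None
    values = [spherical_equivalent(sph, cyl) for sph, cyl in measured]
    return sum(values) / len(values)
```

For example, a right eye of -1.00 DS / -0.50 DC and a left eye of -0.75 DS / -0.50 DC give an MSE of -1.125 D; if only one eye was measured, its spherical equivalent is used directly.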

Single nucleotide polymorphism selection and genotyping: 
DNA was obtained from the Coriell Institute (Camden, NJ) 
and was genotyped using a genome-wide Illumina SNP array 
(2.5 million SNPs) and custom SNPs. 

Illumina chip array: All participants were genotyped at the 
Center for Inherited Disease Research (CIDR) using the 
Illumina HumanOmni2.5-4v1_B chip array (San Diego,
CA). Genotype and phenotype data from the AREDS cohort
are publicly available through the database of Genotypes and
Phenotypes (dbGaP) under the name of either the "Michigan,
Mayo, AREDS, Pennsylvania (MMAP) study" or the 
AREDS. Initially, all SNPs in a window of 100 kb on either 
side of the discovery SNP were chosen for analysis. In the 
15q25 region, this window was extended to 350 kb to provide 
better coverage in this region. 
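The regional selection described above amounts to keeping every SNP whose base-pair position lies within a fixed flank of the discovery SNP. A minimal sketch, assuming a hypothetical mapping from SNP ID to position on a single chromosome and genome build (the IDs and positions below are placeholders, not study data):

```python
from typing import Dict, List

# Flank sizes described in the text (base pairs on either side of the discovery SNP).
FLANK_15Q14_BP = 100_000
FLANK_15Q25_BP = 350_000


def snps_in_window(snp_positions: Dict[str, int], index_pos: int, flank_bp: int) -> List[str]:
    """Return IDs of SNPs within flank_bp of index_pos (inclusive), ordered by position.

    snp_positions maps SNP ID -> base-pair position; all SNPs are assumed to lie on
    the same chromosome (chromosome 15 here) and the same genome build.
    """
    in_window = [(pos, snp) for snp, pos in snp_positions.items()
                 if abs(pos - index_pos) <= flank_bp]
    return [snp for _, snp in sorted(in_window)]
```

Widening the flank from 100 kb to 350 kb simply admits SNPs farther from the discovery SNP, which is how the 15q25 window was extended for better coverage.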

Custom single nucleotide polymorphisms: Tagging SNPs
from the 15q14 and 15q25 regions were selected from HapMap
release 28 Phase II+III on the NCBI B36 human genome
assembly, dbSNP b126. The HapMap genotype data were
imported into Haploview 4.2 to obtain the r-squared (r²)
linkage disequilibrium (LD) plot. Selection of tagging SNPs
was limited to those with a minor allele frequency of at least 5%
in the CEU HapMap sample. The tagging SNPs on 15q14
and 15q25 were centered on the golgin A8 family, member
B (GOLGA8B), gap junction protein, delta 2 (GJD2), actin,
alpha, cardiac muscle 1 (ACTC1), and Ras protein-specific
guanine nucleotide-releasing factor 1 (RASGRF1) genes
and were designed to provide better coverage of these
regions than was available from the SNPs on the Illumina
HumanOmni2.5-4v1_B chip array alone.
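To illustrate how r²-based tag selection works in principle, the sketch below implements a simple greedy pairwise tagger. It is not Haploview's algorithm: Haploview estimates r² from phased haplotypes, whereas this sketch uses the squared Pearson correlation of 0/1/2 genotype dosages as a proxy, and the r² threshold of 0.8 is a hypothetical default (the text does not state one). Only the 5% minor allele frequency filter comes from the text.

```python
import numpy as np


def minor_allele_frequency(g: np.ndarray) -> float:
    """MAF from 0/1/2 dosages of one SNP (no missing values assumed)."""
    p = float(g.mean()) / 2.0
    return min(p, 1.0 - p)


def greedy_tag_snps(genotypes: np.ndarray, r2_threshold: float = 0.8,
                    min_maf: float = 0.05) -> list:
    """Greedy pairwise tagging on genotype-based r^2.

    genotypes: (n_individuals, n_snps) array of 0/1/2 dosages.
    Returns column indices of the selected tag SNPs.
    """
    n_snps = genotypes.shape[1]
    maf = np.array([minor_allele_frequency(genotypes[:, j]) for j in range(n_snps)])
    eligible = np.flatnonzero((maf >= min_maf) & (genotypes.std(axis=0) > 0))
    if eligible.size == 0:
        return []
    # Squared Pearson correlation of dosages: a genotype-based proxy for haplotype r^2.
    r2 = np.atleast_2d(np.corrcoef(genotypes[:, eligible], rowvar=False)) ** 2
    tags_snp = r2 >= r2_threshold          # tags_snp[i, j]: SNP i tags SNP j
    untagged = np.ones(eligible.size, dtype=bool)
    tags = []
    while untagged.any():
        # Pick the candidate that captures the most still-untagged SNPs.
        coverage = (tags_snp & untagged).sum(axis=1)
        best = int(np.argmax(coverage))
        tags.append(int(eligible[best]))
        untagged &= ~tags_snp[best]
    return tags
```

Each SNP tags itself (r² = 1), so the loop always makes progress; SNPs in strong LD with an already chosen tag are dropped, which is why a dense LD block needs only a few tags.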

All 23 SNPs were genotyped using a customized 
TaqMan SNP genotyping assay (Applied Biosystems [ABI], 
Foster City, CA). All PCR amplifications were performed 
with the following thermal cycling conditions: 95 °C for 10 
min followed by 40 cycles of 95 °C for 15 s and 60 °C for 
1 min. PCR reactions were performed with TaqMan Geno- 
typing Master Mix (ABI) in either a GeneAmp PCR System 
9700 (ABI) or a 7900HT Fast Real-Time PCR System (ABI). 
All pre- and post-PCR plate readings were performed on a 
7900HT Fast Real-Time PCR System (ABI), and the allele 
types were confirmed by the system's software (7900HT Fast 
Real-Time PCR System SDS software version 2.3; ABI). 


Quality control: 

Illumina Omni2.5 array: All genotypes were
imported into PLINK [26], and individuals were removed if
they had a high rate of missing genotypes (>1%), known cryptic
relatedness, chromosomal abnormalities (described elsewhere
[27]), or non-Caucasian ancestry according to principal
components analysis [27]. The final genotyping rate in the
remaining 1,877 individuals was 0.999.

TaqMan assays: All genotypes were imported into PLINK, 
and individuals were removed for a low genotyping rate 
(<20%) and the same filtering criteria described for the
Illumina SNP panel. The final genotyping rate in the remaining
1,879 individuals was 0.997. 
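The per-individual missingness filter above is analogous to PLINK's `--mind` option (for example, `--mind 0.01` removes individuals with more than 1% missing calls). A minimal numpy sketch of the same idea, assuming a hypothetical dosage matrix with missing calls coded as NaN; it also reports the overall genotyping rate among retained individuals:

```python
import numpy as np


def filter_samples_by_missingness(genotypes: np.ndarray, max_missing: float):
    """Drop individuals whose fraction of missing calls exceeds max_missing.

    genotypes: (n_individuals, n_snps) float array of 0/1/2 dosages with np.nan for
    missing calls. Returns (kept genotypes, boolean keep mask, overall genotyping
    rate among kept individuals).
    """
    missing_rate = np.isnan(genotypes).mean(axis=1)   # per-individual missingness
    keep = missing_rate <= max_missing
    kept = genotypes[keep]
    genotyping_rate = 1.0 - float(np.isnan(kept).mean()) if kept.size else float("nan")
    return kept, keep, genotyping_rate
```

With `max_missing=0.01` this mirrors the 1% criterion used for the Illumina panel; the remaining exclusions (relatedness, chromosomal abnormalities, ancestry outliers) are separate steps not shown here.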

Statistical analysis: First, RE was analyzed as the MSE of 
both eyes using linear regression in PLINK. Age, gender, and 
education, plus three principal components, were included 
as covariates in the analysis. Myopia and hyperopia were 
analyzed as a categorical trait using logistic regression in 
PLINK along with the covariates age, gender, education, 
and the three principal components. For myopia, affected
individuals were defined as those with an MSE of -1D or worse,
and controls as those with an MSE of 0D or greater. Individuals
with an MSE between -1D and 0D were coded as unknown.
Hyperopia was defined as an MSE of +1D or greater, controls
were defined as those with an MSE of 0D or less, and
individuals between 0D and +1D were coded as unknown.

© 2013 Molecular Vision
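The trait coding and the additive regression model described above can be sketched as follows. The case/control cut-offs are taken from the text; the regression is a plain numpy least-squares fit with the same structure as an additive per-SNP test with covariates (as PLINK's `--linear` with `--covar` fits), but it is only an illustration: PLINK reports t-distribution p values, whereas this sketch uses a normal approximation, and the logistic model for the categorical traits is not shown.

```python
import math
from typing import Optional

import numpy as np


def code_myopia(mse: float) -> Optional[int]:
    """1 = case (MSE <= -1 D), 0 = control (MSE >= 0 D), None = unknown (-1 D < MSE < 0 D)."""
    if mse <= -1.0:
        return 1
    if mse >= 0.0:
        return 0
    return None


def code_hyperopia(mse: float) -> Optional[int]:
    """1 = case (MSE >= +1 D), 0 = control (MSE <= 0 D), None = unknown (0 D < MSE < +1 D)."""
    if mse >= 1.0:
        return 1
    if mse <= 0.0:
        return 0
    return None


def additive_linear_test(dosage: np.ndarray, trait: np.ndarray, covariates: np.ndarray):
    """OLS of trait on allele dosage (0/1/2) plus covariates (n x k array).

    Returns (beta, se, p) for the dosage term. The p value uses a normal
    approximation to the t distribution, adequate for n in the thousands.
    """
    n = trait.shape[0]
    X = np.column_stack([np.ones(n), dosage, covariates])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ coef
    sigma2 = float(resid @ resid) / (n - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta, se = float(coef[1]), math.sqrt(cov[1, 1])
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))     # two-sided normal p value
    return beta, se, p
```

In practice the covariate matrix would hold age, gender, education, and the three principal components; individuals coded None for a categorical trait are simply excluded from that analysis.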

Calculation of effective number of tests and significance 
thresholds: For the 15ql4 and 15q25 regions, we tested the 
reported discovery SNPs, the replication SNPs from the 
Verhoeven et al. replication study, our custom SNPs, and all 
additional SNPs available from our Illumina chip that were 
within 100 kb of the original discovery SNP for 15q14
and 350 kb for 15q25. We used Ramos et al.'s [16] method
to calculate the effective number of tests (N_eff) and divided
α by N_eff to calculate the significance threshold. The Ramos et al.
method calculates intermarker LD as part of the correction 
procedure. The markers in this region are densely spaced, 
and there are only a few LD blocks across each region, 
with high levels of LD between the markers in each block 
(Appendix 2 and Appendix 3). Given the number of SNPs in 
the replication regions, even significant p values would not 
survive Bonferroni correction, but such an adjustment would 
be extremely over-conservative given the large amount of 
intermarker LD in this region. Adjusting for multiple testing 
using false discovery rate methods may be less conservative 
than Bonferroni correction, but still does not account for the 
extreme non-independence of all the SNPs in the region. The 
Ramos et al. method was developed to deal with this issue, by 
calculating the pairwise LD in the sample, and calculating the 
effective number of tests based on the correlation structure 
of the markers. In Appendix 4, which is a zoom of the
right-hand side of SF1, the CREAM replication SNPs for 15q14
are found in the region marked with a green rectangle; due 
to the way Haploview displays densely packed SNPs, two 
rectangles are included, one delineating the SNPs separated 
by relative spacing above and a second rectangle below where 
even spacing is used for the LD display. In Appendix 5, a 
zoom of the left-hand side of SF2, only one rectangle has 
been included as the SNP density is lower, representing the 
region where the CREAM replication SNPs are located for 
the 15q25 locus. These figures provide a clear visualization 
of the fact that the number of truly independent SNPs tested 
in the current replication analysis (the effective number of 
SNPs calculated by the Ramos et al. method) is
considerably smaller than the actual number of SNPs used for the
replication. 
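To show how an LD-aware correction shrinks the number of tests, the sketch below computes an effective number of tests from the eigenvalues of an SNP-by-SNP correlation matrix and divides α by it. The exact estimator used by Ramos et al. and by the Genetic Type I Error Calculator may differ; as a stand-in, this uses the eigenvalue estimator of Li and Ji (2005). The default α = 0.05 is an assumption, chosen because it yields thresholds near the 0.002 (-log10(p) ≈ 2.7) used in this paper.

```python
import math

import numpy as np


def effective_number_of_tests(corr: np.ndarray) -> float:
    """Li & Ji (2005) estimate of M_eff from an SNP-by-SNP correlation matrix.

    M_eff = sum_i [ I(|lambda_i| >= 1) + (|lambda_i| - floor(|lambda_i|)) ],
    where lambda_i are the eigenvalues of the correlation matrix.
    """
    lam = np.abs(np.linalg.eigvalsh(corr))
    lam = np.round(lam, 10)  # absorb floating-point noise such as 2.9999999999999996
    return float(np.sum((lam >= 1.0) + (lam - np.floor(lam))))


def replication_threshold(m_eff: float, alpha: float = 0.05) -> float:
    """Per-SNP significance threshold alpha / M_eff."""
    return alpha / m_eff
```

Independent SNPs (an identity correlation matrix) each count as one test, whereas a block of SNPs in perfect LD collapses to a single test. With N_eff = 24.65, for instance, `replication_threshold(24.65)` is about 0.00203, i.e. -log10(p) ≈ 2.69.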

Power calculations: Power calculations were performed using 
Quanto [28]. All power calculations assumed that the effect 
sizes reported by Solouki et al. [11] and Hysi et al. [14] were 
correct, and not inflated due to the winner's curse. 
RESULTS

Power calculations: For each locus, we had good power in
this sample to detect association between a variant and the
quantitative trait (MSE) at the p=0.002 level (which corrects
for the effective number of tests, as detailed in the Methods)
across a range of minor allele frequencies. We had power
between 60% and 78% for allele frequencies between 0.3 and
0.5 on 15q14 and above 80% for allele frequencies of 0.2 to
0.5 for 15q25. For the qualitative trait myopia (≤ -1D), the
power was above 80% for allele frequencies of 0.05 to 0.5 for
15q14 and between 60% and 77% for allele frequencies
between 0.25 and 0.5 for 15q25. For hyperopia (≥ +1D), the
power was above 70% for allele frequencies above 0.1 for
15q14, and the power for 15q25 was below 30% for allele
frequencies above 0.1. Power calculations assumed 1) an
additive quantitative trait locus (QTL) effect size (β) and
discrete trait odds ratio (OR) based on the reported values
from each paper: β=-0.27 and OR=1.41 [11] for the 15q14
locus and β=-0.35 and OR=1.16 [14] for the 15q25 locus,
2) minor allele frequencies across a range of 0.05-0.5, and
3) complete LD between the marker SNP and the causal
variant (D'=1). Full details of the power calculations are
available in Appendix 1.
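The quantitative-trait power calculation can be approximated in closed form. This is not Quanto's algorithm; it is a normal-approximation sketch under the same headline assumptions listed above (additive effect, marker equal to the causal variant, no winner's curse). The trait standard deviation is a required input that the sketch does not take from this study; the value 2.0 D used in the test is a hypothetical placeholder.

```python
from statistics import NormalDist


def qtl_power(n: int, maf: float, beta: float, trait_sd: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df additive test for a quantitative trait.

    Assumes the tested SNP is the causal variant (r^2 = 1), Hardy-Weinberg
    proportions, and a per-allele effect beta in trait units (diopters here).
    """
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_sd ** 2
    ncp = n * var_explained / (1.0 - var_explained)   # non-centrality of the 1-df chi-square
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)
```

Because the variance explained scales with 2·MAF·(1 - MAF), power rises with minor allele frequency up to 0.5, which matches the pattern of the ranges reported above; with β = 0 the function returns α, the false-positive rate.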

Significance threshold for 15q14 and 15q25: A total of 1,337
and 1,224 SNPs were tested in the 15q14 and 15q25 regions,
respectively (the most significant discovery SNP, the other
SNPs used for replication in the CREAM replication study,
our custom SNPs, and the additional SNPs in each region
that were available from our Illumina 2.5M chip and were
within 100 kb (15q14) or 350 kb (15q25) of the most
significant discovery SNP in the region). The effective number
of tests calculated using Ramos et al.'s method was
N_eff = 24.65 for the 15q14 region and N_eff = 25.43 for
the 15q25 region, giving significance thresholds (α/N_eff) of
approximately 0.002.

Mean spherical equivalent as a quantitative trait: There 
were 1,877 individuals available for analysis, with an MSE 
of +0.56D (standard deviation^. 15). For myopia, 346 indi- 
viduals met the criteria for cases, and 1,333 were controls. 
For hyperopia, a total of 858 individuals met the criteria for 
inclusion as cases, and 602 as controls. 

15q14 results: None of the SNPs from the CREAM replication
study of the 15q14 region was even nominally significantly
associated with MSE in our data set (Table 1). The most
significant discovery SNP, rs634990 (p=0.20), was not
replicated (Table 1). None of the additional SNPs we
genotyped achieved nominal significance in the 15q14 region
(Table 2). When all SNPs available from the Illumina array
within 100 kb of the discovery SNP rs634990 were included,
the most significant SNP had p=7.38×10⁻³ and was 7.26 kb
away from the closest originally associated SNP picked for the
replication study (Figure 1 and Appendix 1).
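The regional summaries reported here (the strongest SNP in the window and its distance to the nearest CREAM replication SNP) reduce to a small lookup. A sketch with hypothetical SNP IDs, p values, and positions (not study data):

```python
from typing import Dict, Iterable, Tuple


def top_hit_and_distance(pvalues: Dict[str, float], positions: Dict[str, int],
                         replication_snps: Iterable[str]) -> Tuple[str, float, int]:
    """Return (best SNP, its p value, bp distance to the nearest replication SNP)."""
    best = min(pvalues, key=pvalues.get)  # smallest p value in the window
    distance = min(abs(positions[best] - positions[s]) for s in replication_snps)
    return best, pvalues[best], distance
```

The top hit's p value can then be compared with the region's α/N_eff threshold, or equivalently -log10(p) with the 2.7 line drawn in the figures.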

15q25 results: None of the chosen replication SNPs from the
CREAM replication study or our custom genotyped SNPs
for the 15q25 region were even nominally significantly
associated with MSE in our data set (Table 1). The most
significant of these SNPs was rs7183668 (p=0.03; Table 2).
Including all SNPs available from the Illumina array within
350 kb of rs8027411, the most strongly associated SNP was
rs2002832 (p=3.49×10⁻³), which is 17.7 kb from the closest
originally associated SNP picked for the replication study
(Figure 2 and Appendix 1).

15q14 results: As seen with MSE, none of the CREAM
replication SNPs or the custom TaqMan SNPs were even
nominally significantly associated with myopia. The most
significant of these SNPs was rs533021 (p=0.07; Table 2).
Inclusion of all SNPs from the Illumina array that were
within 100 kb of rs634990 showed several SNPs with some
evidence of association (Appendix 1). The strongest signal
was from rs893132 (p=2.8×10⁻³), which is close to but not
below our replication threshold of 0.002; several SNPs had
similar values (Figure 3 and Appendix 1).

15q25 results: No SNPs were even nominally significantly 
associated with myopia in the CREAM replication SNPs. 
In the custom SNP genotyping, rs7183668 was only slightly 
more significant for myopia than the MSE (myopia p=0.09 
versus MSE p=0.1). The remaining SNPs from the array 
showed some evidence of association, with several SNPs 
having p values on the order of 10⁻³, although none were below
0.002 (Figure 4 and Appendix 1). 

Hyperopia results: A total of 858 individuals met the criteria 
for inclusion as hyperopes, and 602 were controls. 

15q14 results: Ten of the 14 CREAM replication SNPs were
nominally significant in this analysis, with the strongest
signal coming from rs7176510 (p=0.01; Table 1). The
additional genotyped SNPs were not significant (Table 2). A
full analysis of all SNPs in the region from the Illumina array
within 100 kb of rs634990 showed a more consistent signal,
with a cluster of SNPs close to the CREAM replication SNPs
with much more significant p values (Figure 5). The most
significant SNP was rs1357179 (p=1.69×10⁻³), which passed
the significance threshold of 0.002 for replication even in this
small sample (Appendix 1).

Figure 1. 15q14 replication single nucleotide polymorphisms
from the Consortium for Refractive Error and Myopia
replication study (circles), additional genotyped single
nucleotide polymorphisms (triangles), and all single
nucleotide polymorphisms from the genome-wide association
study array (diamonds) for the mean spherical equivalent.
The replication significance threshold is -log10(p)=2.7.



15q25 results: None of the CREAM replication SNPs were
even nominally significant. Of the additional custom
genotyped SNPs, one SNP (rs7183668, p=0.03) achieved
nominal significance (Table 2), but this did not pass the
replication significance threshold of 0.0019. Analysis of the
full set of available genotypes in the region (within 350 kb
of rs8027411) identified a cluster of SNPs within 0.3 Mb of
the CREAM replication study SNPs that were more
significant (Figure 6). The most significant SNP was
rs7164400 (p=8.39×10⁻⁴), which passed the replication
threshold (Appendix 1).

Figure 2. 15q14 replication single nucleotide polymorphisms
from the Consortium for Refractive Error and Myopia
replication study (circles), additional genotyped single
nucleotide polymorphisms (triangles), and all single
nucleotide polymorphisms from the genome-wide association
study array (diamonds) for myopia. The replication
significance threshold is -log10(p)=2.7.

Figure 3. 15q25 replication single nucleotide polymorphisms
from the Consortium for Refractive Error and Myopia
replication study (circles), additional genotyped single
nucleotide polymorphisms (triangles), and all single
nucleotide polymorphisms from the genome-wide association
study array (diamonds) for myopia. The replication
significance threshold is -log10(p)=2.7.
Figure 4. 15q25 replication single nucleotide polymorphisms
from the Consortium for Refractive Error and Myopia
replication study (circles), additional genotyped single
nucleotide polymorphisms (triangles), and all single
nucleotide polymorphisms from the genome-wide association
study array (diamonds) for the mean spherical equivalent.
The replication significance threshold is -log10(p)=2.7.
Figure 5. 15q14 replication single nucleotide polymorphisms
from the Consortium for Refractive Error and Myopia
replication study (circles), additional genotyped single
nucleotide polymorphisms (triangles), and all single
nucleotide polymorphisms from the genome-wide association
study array (diamonds) for hyperopia. The replication
significance threshold is -log10(p)=2.7.



Conclusions: Early GWAS designs were frequently
underpowered and did not adequately control for type I error.
The need to address these issues has led to the formation of
large consortia so that studies would have sufficient sample
sizes for discovery and replication of results. As these
consortia have increased in size, the diversity of the
populations added has also increased. Controlling for
population stratification within populations and using
meta-analysis to deal with between-population differences in
sample size and tagging-marker allele frequency allow us to
control for chance associations. However, the problems of
allelic heterogeneity and of differing patterns of linkage
disequilibrium and haplotypes remain. Our approach of
querying a large number of SNPs within 100 kb (15q14) or
350 kb (15q25) of the most significant SNP from the
discovery study helps to ameliorate the loss of power due to
different LD patterns in the associated region across
populations.

Figure 6. 15q25 replication single nucleotide polymorphisms
from the Consortium for Refractive Error and Myopia
replication study (circles), additional genotyped single
nucleotide polymorphisms (triangles), and all single
nucleotide polymorphisms from the genome-wide association
study array (diamonds) for hyperopia. The replication
significance threshold is -log10(p)=2.7.

Recent GWASs have identified several loci in European 
populations that confer susceptibility to RE, and efforts to 
replicate these loci have met with some success for the 15q14
locus. However, the AREDS cohort did not even nominally
replicate the SNPs queried in the CREAM replication study,
except for hyperopia at the 15q14 locus (p=0.03). Genotyping
a few additional tagging SNPs in the region
yielded only nominally significant results. However, selecting
a denser panel of SNPs from an expanded region around the 
candidates (all candidate region SNPs that were available on 
the Illumina HumanOmni 2.5 array) revealed the association 
of several SNPs not genotyped in the original Rotterdam 
study or included in the CREAM replication effort. These 
SNPs are actually close to the SNPs reported as significant 
in the CREAM replication study. Calculating the effective 
number of tests reveals that several SNPs approached or 
exceeded the significance thresholds for replication even in 
this small sample. 

The AREDS cohort study is a United States-wide collection
of Caucasian individuals who have a wide variety of
European backgrounds and possibly low levels of admixture
with African and indigenous populations (low enough to be
undetectable by analyses for population stratification). Allele
frequency clines exist in Europe, and it is possible to infer a
European's country of origin quite accurately from a small
number of informative markers [29]. Indeed, our principal
components analyses of the entire set of GWAS markers
showed significant evidence of population substructure [27].
The analyses of the AREDS data in [15] corrected for several
principal components, and clear outlier individuals were
removed, but some subtle substructure may still exist.
However, we suspect that the most likely reason for the
different results from the AREDS data and some of the other
data sets that strongly replicated the "CREAM replication
SNPs" association in [15] is the difference in how the
"CREAM replication SNPs" tag the other SNPs in the region
that were not analyzed. Thus, the more mixed background of
the AREDS cohort compared with the Rotterdam and TwinsUK
studies may account for the inability of the AREDS sample
to replicate the chosen SNPs at even the 0.05 significance
threshold in the large CREAM meta-analysis of the 15q14
region [30]. Our power calculations suggest that we had
sufficient power to detect association (at p=0.002) for the
CREAM replication SNPs. Indeed, when we queried a much
denser panel of SNPs within the candidate region, we found
stronger evidence of association with each trait and had
significant evidence of replication for hyperopia. Of course,
the moderate size of this study leaves open the possibility
that association with other SNPs in this region was not
detected because of lack of power. However, this study
illustrates what multiple authors have pointed out: regional
replication is a more powerful approach than merely picking a
few SNPs to replicate, particularly when the small number of
SNPs picked for replication may not tag the underlying
haplotypes (and thus the ungenotyped true causal SNP) in the
same manner across different data sets. This paradox suggests
that testing a small number of significant SNPs from an
associated region could lead to non-replication in the
replication cohort. Ioannidis et al. [31] pointed out that when
only one or a few of the most significant SNPs from an
associated region are included in the follow-up set, the
selected SNP(s) are not necessarily more informative or closer
to the causal variant. They suggest that using only one or a
few SNPs for replication yields less robust information for
those regions and may result in failure of the replication.
They propose that combining complete GWASs in a
meta-analysis is a more fruitful approach than attempting to
replicate only one or a few significant SNPs from a candidate
region. Asimit et al. [18] also supported the idea of combining
a large number of cohorts in a meta-analysis as a method for
improving power. Several analysis approaches that incorporate
multiple SNPs have been published [32-39], as well as
approaches that incorporate linkage information [40] and
pathway-based association approaches [41]. Christoforou et al.
[39] proposed using an LD-based binning strategy to interpret
and compare multiple GWASs, an approach that may prove to be
the most fruitful. Still, several issues need to be resolved,
such as how to handle SNPs that map to multiple and sometimes
overlapping genes and how to handle correlations between genes
and derived gene scores. In the meantime, when attempting to
replicate significant associations, studying a denser panel of
SNPs from the associated region may be more powerful [16-18],
and imputation to the HapMap and/or the 1000 Genomes data can
help provide information on genotypes at the same markers
across studies even if they used different GWAS genotyping
platforms.

Whole-exome and whole-genome sequencing studies
will produce data on more variants than ever before, many
of them individually quite rare. The temptation to study only
the variants that have been genotyped in all the member
cohorts of a consortium will be strong. However, in traits
where alleles of modest effect are sought, this approach
misses many variants of interest. Many techniques are being
developed to combine data from multiple SNPs, from the
various collapsing methods for rare variants to gene-based
and pathway-based approaches. However, no robust method
has yet emerged for combining these results across
heterogeneous populations. Moving away from cohorts of
unrelated individuals to family studies may help address some
of these issues. Refocusing efforts on linkage, which is robust
to allelic heterogeneity, could assist with detecting rare
variants with large effects. For existing consortia, the
challenge is to find a method that can adequately account for
allelic and haplotypic heterogeneity while still controlling
type I error.

APPENDIX 1. SUPPLEMENTARY TABLES S1-S8.


APPENDIX 2. LINKAGE DISEQUILIBRIUM 
STRUCTURE IN THE 15Q14 REGION IN THE 
AREDS COHORT. 


APPENDIX 3. LINKAGE DISEQUILIBRIUM 
STRUCTURE IN THE 15Q25 REGION IN THE 
AREDS COHORT. 


APPENDIX 4. ZOOMED IN SECTION OF 
LINKAGE DISEQUILIBRIUM IN THE 15Q14 
REGION TO SHOW LOCATION OF ORIGINAL 
CREAM REPLICATION SNPS (GREEN BAR). 


APPENDIX 5. ZOOMED IN SECTION OF 
LINKAGE DISEQUILIBRIUM IN THE 15Q25 
REGION TO SHOW LOCATION OF ORIGINAL 
CREAM REPLICATION SNPS (GREEN BAR). 
